Radiomics workflow definition & challenges - German priority program 2177 consensus statement on clinically applied radiomics

Objectives
To achieve a consensus on a definition for different aspects of radiomics workflows to support their translation into clinical usage. Furthermore, to assess the perspective of experts on important challenges for a successful clinical workflow implementation.

Materials and methods
The consensus was achieved by a multi-stage process. Stage 1 comprised a definition screening, a retrospective analysis with semantic mapping of terms found in 22 workflow definitions, and the compilation of an initial baseline definition. Stages 2 and 3 consisted of a Delphi process with over 45 experts from sites participating in the German Research Foundation (DFG) Priority Program 2177. Stage 2 aimed to achieve a broad consensus for a definition proposal, while stage 3 identified the importance of translational challenges.

Results
Workflow definitions from 22 publications (published 2012–2020) were analyzed. Sixty-nine definition terms were extracted and mapped, and semantic ambiguities (e.g., homonymous and synonymous terms) were identified and resolved. The consensus definition was developed via a Delphi process. The final definition, comprising seven phases and 37 aspects, reached a high overall consensus (> 89% of experts "agree" or "strongly agree"). Two aspects reached no strong consensus. In addition, the Delphi process identified and characterized, from the participating experts' perspective, the ten most important challenges in radiomics workflows.

Conclusion
To overcome semantic inconsistencies between existing definitions and offer a well-defined, broad, referenceable terminology, a consensus workflow definition for radiomics-based setups and a mapping of terms to the existing literature were compiled. Moreover, the most relevant challenges towards clinical application were characterized.

Critical relevance statement
Lack of standardization represents one major obstacle to successful clinical translation of radiomics. Here, we report a consensus workflow definition on different aspects of radiomics studies and highlight important challenges to advance the clinical adoption of radiomics.

Key Points
Published radiomics workflow terminologies are inconsistent, hindering standardization and translation.
A consensus radiomics workflow definition proposal with high agreement was developed.
Result resources are publicly available for further exploitation by the scientific community.


Table instructions
The purpose of the provided table is to illustrate the terms found in the analyzed literature and their relations. Furthermore, the table documents which term was used to represent a certain definition in the Delphi process and which consensus term was finally chosen for that definition.

How to read the table
The first four columns (heading "Results of definition screening") contain terms extracted from the analyzed literature. The first three columns (heading "Terms in literature") depict the identified hierarchies between the terms. A cell of a higher-order column can span multiple rows of the lower-order column (e.g. column 1 over column 2). This indicates that all these rows of the lower-order column contain sub terms. For example, the "Image processing" cell spans multiple rows of the 2nd column, which indicates that e.g. "Data conversion" and "Interpolation" are sub terms of "Image processing" (see figure on the left). In the same way, it is indicated that "ROI interpolation" is a sub term of "Interpolation" (whose cell spans the "ROI interpolation" row). The fourth column (heading "Alternative terms") contains all synonyms found for the term on the left side of the same row (e.g. "Preprocessing" is one synonym found for "Image processing"; see figure on the left).
The last two columns (heading "Delphi process") indicate which terms were used in the Delphi process after the analysis. The fifth column (heading "Proposal (before Delphi)") shows the terms that were used at the beginning of the process. The sixth column (heading "Workflow consensus (after Delphi)") shows the respective consensus terms after the Delphi process.
Remark 1: The rows are ordered according to the analysis process that constructed the tree of step terms. Therefore, the order of the last column ("Workflow consensus (after Delphi)") does not represent the structure/organization (phases and their aspects) of the consensus definition.
Remark 2: There is no dedicated meaning in the selection of which terms appear in the first three columns and which terms are declared their respective synonyms, as all synonymous terms are equivalent.

Choice of imaging protocol [2] #3

Choice of imaging protocol
Choice of imaging data Choice of prediction target [2] Choice of prediction target

Choice of prediction target
Choice of volume of interest [2] ROI definition [4] #1 Choice of volume of interest

Image geometry harmonization and resampling
(Merged into / unified as) ROI interpolation [7,17] Image geometry harmonization (post segmentation) Image geometry harmonization and resampling

Dimensionality Reduction
Exploratory analysis [2] Exploratory analysis Exploratory analysis Choice of modeling methodology [2] Choice of modeling methodology

Split into
Definition of the analysis and modeling strategy and Adaption of the analysis and modeling strategy
Model building [12,14,19] #2 Creation and application of the radiomics model (original German term "Erstellung und Anwendung des Radiomics-Modells") [5] Model development (original German term "Modellentwicklung") [5] Model construction [22] Model training [18] Radiomics signature modeling [20] Multivariate Analysis and Model Building [6] Model building Model building
Classification [15,22] Classification Merged into Modeling as it is just one of several specific task types (e.g. also detection)
Validation [2,3,9,12,16,22] Model analysis [20] Performance evaluation [12] Validation Testing Validation Testing
Reporting open-access scientific data [2] Reporting Reporting
Clinical application of radiomics [7] Deployment (original German term "Bereitstellung") [5] Out of scope Out of scope
Prospective evaluation of model [7] Out of scope Out of scope
Personalized treatment [7] Out of scope Out of scope
Radiomics signature [4] Out of scope Out of scope

Legend:
[x]: Reference to the publication using that term
#1: Term involved in a conflict of type "Homonym"
#2: Term involved in a conflict of type "Hierarchy conflict"
#3: Term involved in a conflict of type "Semantic ambiguity"

Terminology conflicts
Explanation of all conflicts found while screening the workflow definitions in literature.
Synonyms: This type of conflict is indicated if a cell in column "Alternative names in literature" is not empty; each alternative term that is not marked as another type of conflict is a synonym. The cell then contains all synonyms found for the respective term in the same row (e.g. the term "Choice of volume of interest" has the synonymous term "ROI definition"). Remark: Synonyms are regarded as conflicts in this context because they make the "interoperability" between different publications harder: the reader has to translate between different terms.
Occurrences: 55 (including 9 synonymous usages that are also involved in other conflicts)

Homonyms: Homonyms were found when identically named steps were defined differently. This type of conflict is indicated by a term being present in multiple rows of the first 4 columns ("Results of definition screening" columns).
Occurrences:
1. ROI Extraction: Used as synonym for "Segmentation" [5] and as its own term [1].
2. Preprocessing: Used as synonym for "Image processing" [16] and as its own term [12,13] as a sub step of "Feature extraction".
3. ROI Definition: Used as synonym for "Choice of volume of interest" [4] and as synonym for "Segmentation" [6].

Hierarchy conflicts: This type of conflict is a subclass of Homonyms. Hierarchy conflicts occurred when a term was used for a step in one publication but for a sub step in another publication. This is indicated by the same term occurring on multiple levels (nth-order sub steps) of the same step term.
Occurrences:
1. Model building: Used as synonym for "Modelling" [9] and as its sub step [12,14,19].

Semantic ambiguity: Semantic ambiguities occurred where a definition in a publication could not be clearly assigned to one step, but matched multiple main steps.
Occurrences:
1. Choice of imaging protocol [2]: Could be partly a sub step of "Data selection" (planning part) and partly of "Data acquisition" (execution).
2. Image processing [1]: Could be partly sub step "Data conversion" [17] or sub step "Image post-acquisition processing" [17].
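
The homonym and synonym definitions above can be illustrated with a small sketch. It assumes that each term usage is recorded as a triple of term, reference number, and the step the term denotes in that publication. This record layout and the function names are illustrative assumptions and not part of the original analysis; the entries are the three homonym occurrences listed above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (term as written, reference number, step the term denotes in that publication)
# Hypothetical record layout; the entries mirror the homonym occurrences above.
Usage = Tuple[str, int, str]

USAGES: List[Usage] = [
    ("ROI Extraction", 5, "Segmentation"),
    ("ROI Extraction", 1, "ROI Extraction"),
    ("Preprocessing", 16, "Image processing"),
    ("Preprocessing", 12, "Preprocessing"),
    ("Preprocessing", 13, "Preprocessing"),
    ("ROI Definition", 4, "Choice of volume of interest"),
    ("ROI Definition", 6, "Segmentation"),
]


def find_homonyms(usages: List[Usage]) -> Dict[str, Set[str]]:
    """Return terms that denote more than one step across publications."""
    steps_by_term: Dict[str, Set[str]] = defaultdict(set)
    for term, _ref, step in usages:
        steps_by_term[term].add(step)
    return {t: s for t, s in steps_by_term.items() if len(s) > 1}


def find_synonyms(usages: List[Usage]) -> Dict[str, Set[str]]:
    """Return, per step, the alternative terms used to denote it."""
    terms_by_step: Dict[str, Set[str]] = defaultdict(set)
    for term, _ref, step in usages:
        if term != step:  # a differently named term for the same step
            terms_by_step[step].add(term)
    return dict(terms_by_step)


if __name__ == "__main__":
    for term, steps in sorted(find_homonyms(USAGES).items()):
        print(term, "->", sorted(steps))
```

Hierarchy conflicts would additionally need the level (step or sub step) of each usage, which this sketch does not model.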

Terminology
• Workflow: A workflow is structured in phases and their aspects and comprises any activity/step to plan, conduct and report the building of an image feature-based prediction model.
• Phase: Phases represent different fundamental workflow steps and can therefore be found to a certain extent in every Radiomics workflow. A phase may contain one or more aspects. There is a logical dependency between phases, and therefore their order is not arbitrary.
• Aspects: Aspects are activities that take place within a phase. Aspects are often optional and have per se no fixed order (sorting in this document is alphabetical by the English name of the aspect). As an effect of this ambiguity, the literature differs, in parts strongly, on the aspects and their sequence. In some cases aspects are arranged in hierarchies (e.g. because we found sub aspects in the literature).

Remarks
• Mandatory/optional workflow elements: By default, aspects of a workflow can be regarded as optional (in occurrence, order or number). The list below compiles aspects that are documented in the literature and are commonly found in Radiomics workflows. Nevertheless, we think that some aspects are crucial in the Radiomics workflow to ensure the validity and reliability of its results. Those "mandatory" aspects are indicated in the column "Mandatory". Workflow phases that have at least one mandatory aspect are also mandatory.
• Machine learning / Deep learning: Deep learning techniques are increasingly applied in the context of Radiomics workflows as well. They are (potentially) applicable to many aspects of the workflow, from simply replacing single aspects (like doing the annotation) up to replacing large parts of the workflow (e.g. end-to-end approaches). Therefore, the usage of deep learning techniques in the context of the workflow definition is not represented by additional optional aspects (which would be highly redundant) but by indicating which aspects can be replaced/covered by a deep learning technique (indicated by the column "ML").
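
The relation between workflow, phases and aspects, together with the rule that a phase with at least one mandatory aspect is itself mandatory, can be sketched as a small data model. The class names, attribute names and the flags used in the example are hypothetical; they only illustrate the structure and do not reproduce the values of the consensus table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Aspect:
    name: str
    mandatory: bool = False       # corresponds to the column "Mandatory"
    ml_replaceable: bool = False  # corresponds to the column "ML"


@dataclass
class Phase:
    name: str
    aspects: List[Aspect] = field(default_factory=list)  # no fixed order

    @property
    def mandatory(self) -> bool:
        # A phase is mandatory if at least one of its aspects is mandatory.
        return any(a.mandatory for a in self.aspects)


@dataclass
class Workflow:
    phases: List[Phase]  # ordered: phases depend logically on each other

    def mandatory_phases(self) -> List[str]:
        return [p.name for p in self.phases if p.mandatory]


if __name__ == "__main__":
    # Flags below are placeholders chosen for illustration only.
    workflow = Workflow(phases=[
        Phase("Image processing and segmentation", [
            Aspect("Segmentation/annotation", mandatory=True, ml_replaceable=True),
            Aspect("Image filtering"),
        ]),
        Phase("Feature extraction", [
            Aspect("Preprocessing"),
        ]),
    ])
    print(workflow.mandatory_phases())
```

Deriving the phase flag from its aspects, instead of storing it, keeps the two levels consistent when aspects are added or changed.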

X
Choice of imaging data Definition der Bildgebungsdaten Definition of image data / standardization of the imaging protocol to ensure the feasibility and reproducibility of the analysis.

X
Choice of prediction target Definition des Prädiktionsziels Definition of the prediction goal of the model (e.g. stratification with respect to progression-free survival).

X
Choice of region of interest Definition der Zielstruktur Definition of the structures (ROI) to be analyzed incl. the segmentation protocol.

X
Definition of further data (non-imaging) Definition weiterer Daten (keine Bildgebungsdaten) Specification of the non-imaging data used for modeling. The data can be both modeling features and data on relevant endpoints. Because many non-imaging data are often poorly standardized, the formats and terminologies used should also be defined in advance, and the most widely used/established options for coding should be chosen.

Definition of the analysis and modeling strategy
Definition der Analysestrategie Definition of the analysis and modeling strategy to answer the defined research question with the selected data.

Definition of the clinical added value or the expected benefit
Definitionen des klinischen Mehrwertes (Motivation) und des erwarteten Nutzens Definition of the added clinical value or the expected benefit, which should be achieved by the created model.

Image acquisition Bildaufnahme
This aspect refers solely to the image acquisition and the associated acquisition parameters.
Phantom studies Phantomstudien Use of phantom studies to calibrate imaging systems for a prospective study, especially for multi-center studies. Furthermore, phantom studies can be used to investigate differences between scanners and segmentation methods (inter-observer variability).

Reconstruction Rekonstruktion
Use of a reconstruction algorithm to reconstruct the image volume from the raw data.

X
Test-retest imaging Test-Retest-Bildgebung Experimental reproducibility assessment by repeating the image acquisition after a time delay to detect normal variations in the image signal (test-retest).

Data management Datenmanagement
This phase contains all the actions necessary to compile the study data for the analysis and make it available for processing in the radiomics pipeline.

X
Data archiving Datenarchivierung Archiving / Storage of data for potential re-analysis, subsequent validation or further research.
FAIR principles should be regarded/supported by the chosen archiving strategy.

Data format conversion
Datenformat-Konvertierung Conversion of the data into other data formats (e.g. from DICOM to NIfTI). This is solely the transformation of the format (in the case of a lossless conversion). The conversion of the actual data takes place in the "Data conversion" aspect in the "Image processing and segmentation" phase.

Data transfer and import Datentransfer und -import
Transfer and import of the data into a target system which is required for the execution of the workflow (e.g. the evaluation is not conducted in the same facility or is conducted in a non-integrated system).
Ethics and data protection Ethik und Datenschutz Presentation of the ethics vote and details of the data protection measures (anonymisation/pseudonymisation), also for secondary use of the employed data.

X
Export of Imaging Data Export der Bilddaten Export of imaging data (e.g. DICOM images) from the data archive (e.g. PACS) to be able to use them in the Radiomics pipeline.

X
Multi disciplinary data curation and integration

Multidisziplinäre Datenkuratierung und -Integration
Optional inclusion of non-image data (e.g. clinical data, genetic data) that should be used for the modeling.

Record linkage (of multi disciplinary data)
Verknüpfung der Datensätze (multidisziplinär) Linking/merging of data (from different primary sources; e.g. multidisciplinary data with different IDs) of a subject.

Image processing and segmentation Bildverarbeitung und Segmentierung
This phase contains all actions necessary to create the segmentations and prepare images as well as segmentations for feature calculation.

Data conversion Datenkonvertierung
Conversion of the image signal (image data) into another representation (e.g. conversion of a PET signal image into Standardized Uptake Values (SUV)).

X
Image filtering Bildfilterung Processing of the image signal with filters (e.g. noise reduction, gray value normalization, ...).

X
Image geometry harmonization and resampling

Harmonisierung der Bildgeometrie und Resampling
Step to convert all images in the evaluation into an identical image geometry in order to make voxel size-dependent features comparable. The harmonization can be done before the segmentation (the segmentation is thus performed on the harmonized geometries) or after the segmentation (the segmentation must then also be harmonized, and resegmentation may be necessary).

X
Image registration Bildregistrierung Transfer of images to target geometries with a given mapping rule (e.g. to compensate for motion artifacts, to spatially align multimodal images or to normalize to a reference anatomy).

X
Quality control of segmentation Qualitätskontrolle der Segmentierung Checking and correction of the segmentation (especially at its edges) to correct errors that were introduced, e.g., by the segmentation itself or by "harmonization of image geometry" (post segmentation).

X
Segmentation/annotation Segmentierung/Annotation Segmentation/annotation of the defined region of interest based on the defined protocol.

Image quality assessment Qualitätsbewertung der Bilddaten
The analysis of outliers is used to evaluate the quality of the image material used. For the assessment, general criteria (e.g. correct modality or right body part) as well as study-specific criteria (e.g. required minimal resolution, absence of artifacts) can be used. The assessment can be implemented by expert decision as well as by automated quality control (to support scalability and reproducibility, it is advisable to implement a process that is as quantitative and automated as possible).

Feature extraction Merkmalsextraktion
In this phase, all aspects are summarized that are necessary/relevant for feature extraction, i.e. the derivation of quantitative information from the segmented images using mathematical formulas.

Feature calculation Merkmalsberechnung
The actual process of calculating individual features based on the input data, the formula/algorithm and their parameterization.

X X
Intensity discretization Intensitätsdiskretisierung A discretization/binning of intensities within the ROI is performed to make the calculation of texture features comprehensible and to suppress noise. Binning can be performed for all features or adapted for specific features.

X Preprocessing Vorverarbeitung
The preprocessing steps in this phase are used to prepare the images before feature extraction. In contrast to the aspect "Image Processing" (Phase Image Processing & Segmentation), "Preprocessing" only includes preprocessing steps that are needed for specific features (e.g. a Fourier transformation) but have no general relevance or validity. In the works studied, the preprocessing steps named were, among others, filtering in general, edge reduction or smoothing.

Qualitätskontrolle der berechneten Merkmale
Quality control (e.g. through automatic plausibility check or random checks) of the calculated features.

ROI extraction ROI-Extraktion
Isolation of one or more of the ROIs from the rest of the image (e.g. by replacing excluded pixels/voxels with NaN). This step depends on the feature and the implementation of the extraction method.

Modeling Modellierung This phase contains all aspects that are necessary to establish a model that, based on given input data (radiomic features, clinical features, etc.), allows prediction in terms of the defined prediction goal.
x Adaption of the analysis and modeling strategy

Anpassung der Analyse-und Modellierungsstrategie
Sometimes it can be necessary to adapt the analysis strategy of the study in order to achieve the research goal with the given study data. This aspect covers this necessity, but such adaptation should be avoided if possible. If one has to adapt the strategy and diverge from the original study design, this should be handled very carefully and statistically double-checked to ensure the validity and integrity of the results, because later changes can introduce bias, overfitting, statistical errors and the like, which could otherwise compromise the results.

X
Dimensionality Reduction Dimensionsreduktion Combination of several features into a new feature (e.g. by Principal Component Analysis).

Exploratory analysis Explorative Analyse
Interactive analysis of the predictive power of different combinations of radiomic features and non-radiomic features. This can be used to perform more targeted feature selection and reduction. It is important that this aspect is not performed on the data reserved for testing.

Feature harmonization Merkmalsharmonisierung
Mathematical method to correct batch effects (e.g. location-dependent variations during the image acquisition). In contrast to a harmonization of the gray values (e.g. also by phantom studies, see above), here the harmonization takes place only after the extraction of the features.

Feature selection Auswahl von radiomischen Merkmalen
Selection of radiomic features that are relevant and informative for the planned task from the extracted radiomic features (e.g. mRMR). Another criterion is the exclusion of non-reproducible radiomic features.

Model building Modellentwicklung
Optimization of a model to ensure the best possible prediction of the prediction targets based on the selected features. This includes, among others, the parameter optimization of parameterized models (Training) or architectural model/hyperparameter optimization (Validation).

X
Testing Testen Testing with dedicated data that is not used for model training serves as the final evaluation of the suitability of the radiomics model (e.g. with regard to robustness/generalizability, predictive quality/accuracy, ...). Ideally, this is done by means of an independent test set. Testing can also be done by means of cross-validation (in which case it must be ensured that all aspects mentioned above are cross-validated, including e.g. nested cross-validation for hyperparameter optimization). Remark: The term "Testing" was preferred over the often-found "Validation" because validation in ML normally means the optimization of the architectural model/hyperparameters, and "Testing" is the semantically fitting term in ML.

Report Report of the results including all necessary metadata (data provenance, data source, processing steps, data quality) as part of publications or the enablement of subsequent usage.
The FAIR principles should be regarded/implemented in the report.

X
Open
Legend:
#1: Semantic ambiguity: Could be part of "Study design" or partly "Data acquisition".
#2: Semantic ambiguity: Could partly be a sub step of "Study design" (planning part) and partly of "Data acquisition" (execution).
#3: Semantic ambiguity: Could partly be "Export of imaging data" and partly "Multi disciplinary data curation".
#4: Semantic ambiguity: Could partly be "Image processing and Segmentation" and partly "Feature Extraction".
#5: Semantic ambiguity: Could partly be "Data conversion" and partly "Image filtering".
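
As an illustration of the aspects "ROI extraction" and "Intensity discretization" defined in the table above, the following minimal sketch masks voxels outside the ROI with NaN and then bins the remaining intensities. It works on a flattened list of voxel intensities in plain Python. The fixed-bin-number scheme is one common choice and is an assumption here, since the consensus definition does not prescribe a binning scheme; the function names and sample values are hypothetical.

```python
import math
from typing import List, Optional


def extract_roi(image: List[float], mask: List[bool]) -> List[float]:
    """Isolate the ROI by replacing excluded voxels with NaN."""
    if len(image) != len(mask):
        raise ValueError("image and mask must have the same length")
    return [v if inside else math.nan for v, inside in zip(image, mask)]


def discretize_fixed_bin_number(values: List[float],
                                n_bins: int) -> List[Optional[int]]:
    """Map ROI intensities to bins 1..n_bins; NaN voxels map to None.

    Assumed scheme: bin = floor(n_bins * (x - min) / (max - min)) + 1,
    with the maximum intensity assigned to the last bin.
    """
    roi = [v for v in values if not math.isnan(v)]
    if not roi:
        raise ValueError("ROI is empty")
    lo, hi = min(roi), max(roi)
    bins: List[Optional[int]] = []
    for v in values:
        if math.isnan(v):
            bins.append(None)   # voxel outside the ROI
        elif hi == lo:
            bins.append(1)      # constant ROI: everything in one bin
        else:
            index = int(n_bins * (v - lo) / (hi - lo)) + 1
            bins.append(min(index, n_bins))
    return bins


if __name__ == "__main__":
    image = [10.0, 20.0, 35.0, 50.0, 80.0, 5.0]
    mask = [False, True, True, True, True, False]
    roi = extract_roi(image, mask)
    print(discretize_fixed_bin_number(roi, n_bins=4))
```

Because the bin edges are derived from the ROI minimum and maximum, the result depends on the preceding ROI extraction, which is why the two aspects are shown together.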

Problems related to data sharing
This category contains challenges that originate in a lack of standardization of aspects in the workflow.
This category contains challenges that originate in problems with the study design or the way the study is conducted.
This category contains challenges that originate in problems with the way the processing is done or the used tooling.
Imaging inconsistency, Imaging at multiple time points, Use of non-standardized image acquisition protocols, Differences between imaging modalities, Lack of standardization of acquisition parameters, Heterogeneity caused by variations in acquisition parameters, different sequences (MRI), Contrast agent application protocol, used reconstruction kernels)
D.2 Problems related to prediction models (e.g. Choosing a suitable algorithm for model building, Underfitting, Overfitting, Lack of generalizability of Radiomics Models, Lack of clinical utility, Lack of reproducibility of prediction models, Lack of standardized performance evaluation)
X1 Problem related to uncertainty/trustability of models (e.g. Workflows are not used because the uncertainty of their results cannot be investigated; Workflows cannot detect that input is out of distribution (OOD) and react/escalate accordingly)
D.X2 Lacking workflow integration (e.g. Workflows are not well integrated in current workflows; imposing extra steps or systems that hinder normal workflow; too much time consumption by needed interactions)
This category contains challenges that originate in problems with handling the data used in the study (e.g. including additional data not to the correct samples).
This category contains challenges that originate in problems with the used radiomics features (e.g. use of different radiomics extraction pipelines that have different names for the same features).
This category contains challenges that originate in problems with sharing or reusing data for a study (e.g. not sharing data of private data sources makes studies not reproducible).
Legal and privacy problems (e.g. open questions regarding model training/sharing and GDPR; Data sharing and GDPR/(broad) consent management; implications of Medical Device Directive if research software is applied prospectively (often not bearable for research groups))
(e.g. the routine data proves not to be suitable to answer the research question (i.a. due to problems with quantity, quality or contained information); other problems with the data not covered elsewhere)
C.3 Problems related to study design (e.g. Insufficient size of patient cohorts, Lack of statistical significance, Non-consistent conduct of radiomics studies, High false-positive rates, Class imbalance; Insufficient / irrelevant clinical contribution (problem addressed is not relevant in clinical routine, e.g. radiomics prediction of IDH mutation status in gliomas))
(e.g. Lack of validation of studies, Causality difficult to establish, Overly optimistic results, Problems related to reporting of studies, Insufficient reporting about patient data, Insufficient reporting about prediction models, Insufficient reporting about methodological information)
(e.g. Number of subsets, Gaussian filter width for post reconstruction smoothing, unspecified hyperparameters)
D.5 Problems related to segmentations (e.g. Differences in segmentation methods / software, Intra-/inter-observer variability of segmentation, Reproducibility of segmentation methods)